Multiagent reinforcement learning: algorithm converging to Nash equilibrium in general-sum discounted stochastic games

Author

  • Natalia Akchurina
Abstract

Reinforcement learning turned out to be a technique that has allowed robots to ride a bicycle, computers to play backgammon at the level of human world champions, and complicated high-dimensional tasks such as elevator dispatching to be solved. Can it come to the rescue in the next generation of challenging problems, such as playing football or bidding on virtual markets? Reinforcement learning, which provides a way of programming agents without specifying how the task is to be achieved, could again be of use here, but the convergence of reinforcement learning algorithms to optimal policies is only guaranteed under the condition of stationarity of the environment, which is violated in multiagent systems. For reinforcement learning in multiagent environments, general-sum discounted stochastic games become the formal framework instead of Markov decision processes. The concept of an optimal policy is also different in multiagent systems: we can no longer speak about an optimal policy (a policy that provides the maximum cumulative reward) without taking into account the policies of the other agents, which influence our payoffs. In an environment where every agent tries to maximize its cumulative reward, it is most natural to accept Nash equilibrium as the optimal solution concept. In a Nash equilibrium each agent's policy is the best response to the other agents' policies, so no agent can gain from unilateral deviation.

A number of algorithms [1, 2, 4, 5, 6, 7, 8, 9] have been proposed to extend the reinforcement learning approach to multiagent systems. Convergence to Nash equilibria has been proved only for very restricted classes of environments: strictly competitive games [6], strictly cooperative games [2, 7], and 2-agent 2-action iterated games [1]. In [5] convergence to Nash equilibrium has been achieved in self-play for strictly competitive and strictly cooperative games, under the additional and very restrictive condition that all equilibria encountered during the learning stage are unique [7].

In this paper we propose a reinforcement learning algorithm that converges to Nash equilibria with some given accuracy in general-sum discounted stochastic games, and we prove this formally under some assumptions. We claim that it is the first algorithm that finds Nash equilibria for the general case.

The paper is organized as follows. In section 2 we present formal definitions of stochastic games and Nash equilibrium, and prove some theorems that we will need for the equilibrium approximation theorem in section 3. Section 3 is devoted to discussion and the necessary experimental estimations of the conditions of the equilibrium approximation theorem. In sections 5 and 6 the developed algorithm Nash-DE and an analysis of the experimental results are presented, respectively.
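For reference, the equilibrium condition described above can be written out as follows; the notation is a standard one for n-agent discounted stochastic games and is a sketch rather than a quotation from the paper. A joint policy $(\pi^1, \ldots, \pi^n)$ is a Nash equilibrium if, for every agent $i$, every state $s$, and every alternative policy $\hat{\pi}^i$,

\[
v^{i}\bigl(s, \pi^{1}, \ldots, \pi^{i}, \ldots, \pi^{n}\bigr) \;\ge\; v^{i}\bigl(s, \pi^{1}, \ldots, \hat{\pi}^{i}, \ldots, \pi^{n}\bigr),
\quad \text{where} \quad
v^{i}\bigl(s, \pi^{1}, \ldots, \pi^{n}\bigr) = E\!\left[\sum_{t=0}^{\infty} \gamma^{t} r^{i}_{t} \,\middle|\, s_{0} = s,\; \pi^{1}, \ldots, \pi^{n}\right]
\]

is agent $i$'s expected discounted cumulative reward and $\gamma \in [0,1)$ is the discount factor. "No agent can gain from unilateral deviation" is exactly this inequality.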


Similar articles

Convergence Problems of General-Sum Multiagent Reinforcement Learning

Stochastic games are a generalization of MDPs to multiple agents, and can be used as a framework for investigating multiagent learning. Hu and Wellman (1998) recently proposed a multiagent Q-learning method for general-sum stochastic games. In addition to describing the algorithm, they provide a proof that the method will converge to a Nash equilibrium for the game under specified conditions. T...
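A brief sketch of the update rule behind this method, as it is commonly stated for Hu and Wellman's Nash Q-learning (the notation below is ours, not quoted from the snippet): each agent $i$ keeps Q-values over joint actions and backs up its payoff in a Nash equilibrium of the stage game at the next state,

\[
Q^{i}_{t+1}(s, a^{1}, \ldots, a^{n}) = (1 - \alpha_t)\, Q^{i}_{t}(s, a^{1}, \ldots, a^{n}) + \alpha_t \bigl[ r^{i}_{t} + \gamma\, \mathrm{Nash}Q^{i}_{t}(s') \bigr],
\]

where $\mathrm{Nash}Q^{i}_{t}(s')$ is agent $i$'s payoff in a Nash equilibrium of the stage game defined by $\bigl(Q^{1}_{t}(s', \cdot), \ldots, Q^{n}_{t}(s', \cdot)\bigr)$, $\alpha_t$ is the learning rate, and $\gamma$ is the discount factor.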


Multiagent Reinforcement Learning in Stochastic Games

We adopt stochastic games as a general framework for dynamic noncooperative systems. This framework provides a way of describing the dynamic interactions of agents in terms of individuals' Markov decision processes. By studying this framework, we go beyond the common practice in the study of learning in games, which primarily focuses on repeated games or extensive-form games. For stochastic games...


Friend-or-Foe Q-learning in General-Sum Games

This paper describes an approach to reinforcement learning in multiagent general-sum games in which a learner is told to treat each other agent as either a "friend" or "foe". This Q-learning-style algorithm provides strong convergence guarantees compared to an existing Nash-equilibrium-based learning rule.
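To make the friend/foe distinction concrete, here is a minimal Python sketch of the two value computations on the learner's joint-action Q-table in a two-agent game; the function names are ours, and the foe case is simplified to pure-strategy maximin (the original Foe-Q solves a mixed-strategy minimax linear program):

import numpy as np

def friend_value(q_joint):
    # Friend case: the other agent is assumed to help maximize our payoff,
    # so the state value is the maximum over all joint actions.
    return q_joint.max()

def foe_value(q_joint):
    # Foe case (pure-strategy simplification): the other agent is assumed to
    # minimize our payoff, so we take the maximin over our own actions.
    # q_joint[i, j] is our Q-value when we play action i and the foe plays j.
    return q_joint.min(axis=1).max()

q = np.array([[3.0, 0.0],
              [1.0, 2.0]])
print(friend_value(q))  # 3.0: a friend steers play toward the best joint action
print(foe_value(q))     # 1.0: a foe punishes, so we fall back on the safer row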


Multiagent Reinforcement Learning: Theoretical Framework and an Algorithm

In this paper we adopt general-sum stochastic games as a framework for multiagent reinforcement learning. Our work extends previous work by Littman on zero-sum stochastic games to a broader framework. We design a multiagent Q-learning method under this framework and prove that it converges to a Nash equilibrium under specified conditions. This algorithm is useful for finding the optimal strateg...


Learning in Markov Games with Incomplete Information

The Markov game (also called stochastic game (Filar & Vrieze 1997)) has been adopted as a theoretical framework for multiagent reinforcement learning (Littman 1994). In a Markov game, there are n agents, each facing a Markov decision process (MDP). All agents’ MDPs are correlated through their reward functions and the state transition function. As Markov decision process provides a theoretical ...
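A minimal container for this tuple, sketched in Python under the finite-state, finite-action assumption (field names are illustrative only; the snippet does not prescribe any particular representation):

from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class MarkovGame:
    # n agents; a joint action is a tuple with one entry per agent.
    n_agents: int
    states: List[str]
    joint_actions: List[Tuple[str, ...]]
    # rewards[(s, a)] gives one reward per agent: the agents' MDPs are
    # correlated through this function and through the shared transitions.
    rewards: Dict[Tuple[str, Tuple[str, ...]], Tuple[float, ...]]
    # transitions[(s, a)] is a probability distribution over next states.
    transitions: Dict[Tuple[str, Tuple[str, ...]], Dict[str, float]]
    gamma: float = 0.95  # discount factor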




Publication year: 2009